Abstract. We argue that abstract datatypes, with public interfaces
hiding private implementations, represent a form of codata rather than
ordinary data, and hence that proof methods for corecursive programs are 
the appropriate techniques to use for reasoning with them. In particular, 
we show that the universal properties of unfold operators are perfectly 
suited for the task. We illustrate with the solution to a problem in the 
recent literature. 

1 Introduction 

Dijkstra [10] argued that the single most important contribution computing 
science has made to the world is the emphasis on designing abstractions in 
order to manage complexity. Abstract datatypes [30] — with public interfaces 
hiding private implementations — have a pivotal role to play in that contribution. 
Nevertheless, the use of abstract datatypes is not as common among functional 
programmers (particularly those using languages like Haskell, which does not 
have first-class modules) as one might expect from history's lesson. One reason 
for this phenomenon might be the seductive attractions of pattern matching over 
algebraic datatypes [55], which seem to rely on making visible the representation
of data and hence breaking the encapsulation; we return to this point in Section 5. 
But another reason for the underuse of abstract datatypes, we feel, is that they are 
not subject to the familiar proof methods of equational reasoning and induction 
to which functional programming so readily lends itself [2].

The essential reason why standard proof techniques are inapplicable to abstract 
datatypes is that they are a form of codata rather than a form of data, with an 
emphasis on observation rather than construction, process rather than value, and 
the indefinite rather than the finite. In this paper, we argue that the appropriate 
proof methods for reasoning about abstract datatypes are those associated with 
corecursive programs [15]. In particular, building on established work on final 
coalgebra semantics for object-oriented programs, we show that the universal 
properties of unfold operators [16] fit the bill very nicely. 

Our use here of unfold operators, and hence of possibly-infinite data structures, 
pushes us towards lazy rather than eager functional programming. That works 
out nicely, because it is the lazy functional programmer who tends to place 
greater emphasis on equational reasoning. On the other hand, aficionados of 
ML at least have a powerful module facility at their disposal, and so might be 
expected to make greater use of abstract datatypes than do adherents of Haskell. 

We illustrate our case with the solution to a problem of reasoning with abstract
datatypes from the recent literature, concerning the elimination of redundant 
conversions to and from lists in a stream-based reimplementation of the Haskell 
standard list library [9]. 

The remainder of this paper is structured as follows. Section 2 explains 
the modelling of abstract datatypes using existential type quantification, and 
Section 3 discusses the corecursive proof methods appropriate for reasoning about 
such constructions. Section 4 presents the case study on Coutts et al.'s stream 
fusion. Section 5 concludes and discusses related work. 



2 Abstract types have existential type 

Abstract datatypes are "a kind of data abstraction where a type's internal form 
is hidden behind a set of access functions; values of the type are created and 
inspected only by calls to the access functions" [22]. Hiding of the internal form
is achieved by existential quantification over the representation type: an abstract 
data structure consists of operations operating on an internal state, whose type 
is hidden from everything except those operations. Mitchell and Plotkin [32] 
expressed this view in the slogan "abstract types have existential type".

2.1 An example: complex numbers 

For example, consider (a simplification of) the Haskell datatype Complex: 

data Complex = MkComplex Double Double 

This introduces a constructor MkComplex :: Double → Double → Complex. The
outcome is not an abstract datatype, because the representation as a pair of 
Doubles (as Cartesian coordinates, as it happens) is visible. Instead, it is a 
concrete datatype. This provides other advantages — such as pattern matching — 
but loses the benefit of information hiding. For example, to determine whether a 
complex number is real, we can use the following function: 

isReal :: Complex → Bool
isReal (MkComplex x y) = (y == 0.0)

But if we decide to change the representation to polar coordinates, all such 
definitions will need modification. 

An abstract datatype of complex numbers should hide the representation, as 
follows [43]. 

data Complex = ∃s. C (s → (Double, Double) → s)  -- create
                     (s → Complex → s)           -- add
                     (s → Double)                -- real
                     (s → Double)                -- imaginary
                     s                           -- self

As above, this introduces a new constructor C; this takes four functions and
an internal representation or 'self', and yields a Complex. Here, the self is of
type s, for some s; the four functions each take an argument of type s. Note
that the type variable s does not appear on the left-hand side of the datatype
declaration, so it has to be quantified somehow. A universal quantification
would be inappropriate, because the representation is of some type, not any
type; existential quantification is what is required. (In common extensions to
Standard Haskell supporting existential quantification, somewhat perversely, it is
written with the keyword forall [39]. The justification is that a datatype
declaration such as

data D = ∃s. MkD (s, s → Integer)

introduces a constructor MkD :: (∃s. (s, s → Integer)) → D, and this type is
isomorphic to ∀s. ((s, s → Integer) → D) because D is independent of s. In
this paper, however, we will pretty-print that keyword as '∃'.)

Packaged up with the internal representation of a complex number are four 
functions: for creating a new complex number, adding on a second complex 
number (obtaining an updated representation), and extracting the real and 
imaginary components. These can be given more user-friendly names: 

new :: Complex → Double → Double → Complex
new (C n a r i s) x y = C n a r i (n s (x, y))
add :: Complex → Complex → Complex
add (C n a r i s) c = C n a r i (a s c)

rea, ima :: Complex → Double
rea (C n a r i s) = r s
ima (C n a r i s) = i s

Crucially, nothing other than these four functions can access the internal
representation; that is guaranteed by the quantification over the type variable s. In
particular, there is no way to extract the representation itself. So of course, the 
set of operations made available has to be considered carefully; in contrast to 
concrete datatypes, which support pattern matching and hence easy extension 
with new functions, adding a new operation to an abstract datatype inexpressible 
in terms of existing functions requires a change to the datatype definition [8] . 

The abstract datatype specifies a signature, but not an implementation. Here 
is one implementation, in the expected Cartesian coordinates: 

zeroC :: Complex
zeroC = C (λ(x, y) → λz → z)
          (λ(x, y) → λc → (x + rea c, y + ima c))
          (λ(x, y) → x)
          (λ(x, y) → y)
          (0.0, 0.0)

Notice the type s — > Complex — > s for the addition function, allowing 
complex numbers of different representations to be added. Consequently, the 
implementation of add has privileged access to the representation of the first
argument (through the fields x and y), but only public access to the second 
argument (through the operations rea and ima). This inefficiency is a well-known 
problem with binary methods in object orientation [4]. Mitchell and Plotkin's 
approach [32] differs, providing privileged access to the representations of both 
arguments of add but therefore requiring both arguments to have the same 
representation; their alternative is more efficient, but less flexible. The difference 
is essentially a matter of whether the scope of the existential quantification is 
narrowed down to specific objects, or widened out to the whole program. 

Note also the rather odd type Complex → Double → Double → Complex for
new, requiring an existing complex number before a new one can be created. 
Of course, the implementation of an abstract data structure has to come from 
somewhere; the function new creates a new structure using the operations and 
data representation of an existing structure, simply assigning a new state. This 
is more analogous to cloning in prototype-based languages [54] than it is to 
construction de novo in more traditional object-oriented programming. 
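
For readers who want to load the definitions of this section into GHC, the following is a minimal sketch in plain ASCII syntax. It assumes the ExistentialQuantification extension, under which the existential quantifier is spelled forall; unused fields are matched with wildcards, and the Cartesian selectors are written as fst and snd, which is our shorthand rather than the paper's notation.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- GHC spells the existential quantifier with the keyword 'forall'.
data Complex = forall s. C
  (s -> (Double, Double) -> s)  -- create
  (s -> Complex -> s)           -- add
  (s -> Double)                 -- real
  (s -> Double)                 -- imaginary
  s                             -- self

new :: Complex -> Double -> Double -> Complex
new (C n a r i s) x y = C n a r i (n s (x, y))

add :: Complex -> Complex -> Complex
add (C n a r i s) c = C n a r i (a s c)

rea, ima :: Complex -> Double
rea (C _ _ r _ s) = r s
ima (C _ _ _ i s) = i s

-- Cartesian implementation: the hidden state is a pair of coordinates.
zeroC :: Complex
zeroC = C (\_ z -> z)
          (\(x, y) c -> (x + rea c, y + ima c))
          fst
          snd
          (0.0, 0.0)
```

A value such as add (new zeroC 1 2) (new zeroC 3 4) can then be inspected only through rea and ima; any attempt to return the hidden state s from a pattern match is rejected by the type checker.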



2.2 An alternative implementation 

Here is an alternative implementation of complex numbers, using polar
coordinates.

data Polar = P{mag :: Double, phase :: Double}

zeroP :: Complex
zeroP = C (λp → λz → c2p z)
          (λp → λc → let (x, y) = p2c p in c2p (x + rea c, y + ima c))
          (λp → fst (p2c p))
          (λp → snd (p2c p))
          (P{mag = 0.0, phase = 0.0})

p2c p = (mag p × cos (phase p), mag p × sin (phase p))
c2p (x, y) = P{mag = sqrt (x × x + y × y), phase = atan2 y x}

(A wiser definition of c2p would scale the two coordinates before multiplying, 
to avoid overflows; but the naive version above is clearer.) Note that although 
zeroC and zeroP have different representations, the existential quantification 
allows them to be of the same type. 

Now, the reality check for complex numbers becomes: 

isReal :: Complex → Bool
isReal z = (ima z == 0)



We can no longer use pattern matching on the representation; but this definition 
works just as well for a polar — or indeed, any other — representation as for 
Cartesian. 
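
The two implementations can be placed side by side in a single GHC module. The sketch below again assumes the ExistentialQuantification extension and ASCII syntax; the list samples is a hypothetical name introduced only to show that both representations inhabit the same type.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

data Complex = forall s. C
  (s -> (Double, Double) -> s) (s -> Complex -> s)
  (s -> Double) (s -> Double) s

new :: Complex -> Double -> Double -> Complex
new (C n a r i s) x y = C n a r i (n s (x, y))

rea, ima :: Complex -> Double
rea (C _ _ r _ s) = r s
ima (C _ _ _ i s) = i s

data Polar = P { mag :: Double, phase :: Double }

p2c :: Polar -> (Double, Double)
p2c p = (mag p * cos (phase p), mag p * sin (phase p))

c2p :: (Double, Double) -> Polar
c2p (x, y) = P { mag = sqrt (x * x + y * y), phase = atan2 y x }

zeroC, zeroP :: Complex
zeroC = C (\_ z -> z) (\(x, y) c -> (x + rea c, y + ima c)) fst snd (0.0, 0.0)
zeroP = C (\_ z -> c2p z)
          (\p c -> let (x, y) = p2c p in c2p (x + rea c, y + ima c))
          (fst . p2c)
          (snd . p2c)
          (P { mag = 0.0, phase = 0.0 })

-- Written against the interface only, so it works for either representation.
isReal :: Complex -> Bool
isReal z = ima z == 0

-- Both representations have the same type, so they can share a list.
samples :: [Complex]
samples = [new zeroC 3 0, new zeroP 3 0]
```

One caveat of this sketch: with floating-point arithmetic the exact comparison in isReal is fragile for the polar representation, since sin of a rounded phase such as π is a tiny non-zero number. A tolerance would be the more robust choice in practice.
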

2.3 An explicit signature for complex numbers

The type declaration Complex is rather complicated, on account of the number of 
operations provided. Moreover, those operations all have a common domain, the 
hidden representation type; so they can be coalesced into one function returning 
a tuple. We will adopt a convention of separating out the description of the 
signature (that is, the number and types of the operations) from the existential 
quantification, writing the following mutually recursive definitions instead. 

data ComplexF s = CF{_new :: (Double, Double) → s,
                     _add :: Complex → s,
                     _rea :: Double,
                     _ima :: Double}

data Complex = ∃s. C (s → ComplexF s) s

Here is the Cartesian implementation of zero: 

zeroC :: Complex
zeroC = C fc (0.0, 0.0) where
  fc :: (Double, Double) → ComplexF (Double, Double)
  fc = (λ(x, y) → CF{_new = λz → z,
                     _add = λc → (x + rea c, y + ima c),
                     _rea = x,
                     _ima = y})

Note that the function fc is analogous to a class in object-oriented terms: it 
determines the data representation, and provides implementations of the methods 
on that representation. We can provide user-friendly wrappers rea, ima, add, new 
as before. Similarly for the polar implementation zeroP: 

zeroP :: Complex
zeroP = C fp (P{mag = 0.0, phase = 0.0}) where
  fp :: Polar → ComplexF Polar
  fp = (λp → CF{_new = λz → c2p z,
                _add = λc → let (x, y) = p2c p in
                            c2p (x + rea c, y + ima c),
                _rea = fst (p2c p),
                _ima = snd (p2c p)})



2.4 Abstract datatype genericity 

Of course, there is nothing special about the particular datatype of complex 
numbers; the approach generalises very nicely. This leads to datatype-generic
abstract datatypes, parametrised by the signature [14]. The signature should be 
a strictly positive functor: it should be functorial in the state type (in order 
to allow the definition of the unfold for the final coalgebra semantics), and no 
occurrences of the type parameter may appear to the left of an arrow (to maintain
the encapsulation of the hidden state). The abstract datatype itself packages up 
some hidden state with the operations, of type specified by the signature. 

data Functor f ⇒ ADT f = ∃s. D (s → f s) s

(The Haskell type class context 'Functor f' entails an operation fmap of type
(a → b) → (f a → f b).) Instantiating the signature parameter to ComplexF
yields complex numbers: 

type Complex = ADT ComplexF 

zeroCG, zeroPG :: Complex 

zeroCG = D fc (0.0,0.0) 

zeroPG = D fp (P{mag = 0.0, phase = 0.0}) 

(Note that ComplexF and Complex are mutually recursive, so this redefinition 
of Complex entails also a redefinition of ComplexF.)
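
The generic construction can be tried out in GHC along the following lines. This is a sketch under two assumptions of ours: the Functor context is dropped from the data declaration (current GHC discourages contexts on datatypes, so the constraint is placed on the functions that need fmap instead), and the wrappers rea, ima, new and add are spelled out, since the text only says they can be provided as before.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- Hidden state s, packaged with a function producing one step of the signature.
data ADT f = forall s. D (s -> f s) s

data ComplexF s = CF
  { _new :: (Double, Double) -> s
  , _add :: Complex -> s
  , _rea :: Double
  , _ima :: Double
  }

-- ComplexF and Complex are mutually recursive, via the _add field.
type Complex = ADT ComplexF

-- The state occurs only to the right of arrows, so ComplexF is a functor.
instance Functor ComplexF where
  fmap g (CF n a r i) = CF (g . n) (g . a) r i

rea, ima :: Complex -> Double
rea (D h s) = _rea (h s)
ima (D h s) = _ima (h s)

new :: Complex -> Double -> Double -> Complex
new (D h s) x y = D h (_new (h s) (x, y))

add :: Complex -> Complex -> Complex
add (D h s) c = D h (_add (h s) c)

-- The Cartesian 'class': data representation plus method implementations.
fc :: (Double, Double) -> ComplexF (Double, Double)
fc (x, y) = CF
  { _new = id
  , _add = \c -> (x + rea c, y + ima c)
  , _rea = x
  , _ima = y
  }

zeroCG :: Complex
zeroCG = D fc (0.0, 0.0)
```

Each wrapper applies the coalgebra h to the current state once and selects one field, which is the sense in which the operations have been coalesced into a single function returning a tuple.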

3 Data and codata 

Abstract datatypes are inhabited by codata, as opposed to the ordinary data 
inhabiting the more familiar algebraic datatypes. In general, codata is
manipulated through destructors instead of constructors. Kieburtz [27] identifies some
fundamental respects in which codata differs from data: 

— Codata structures have hidden representations, accessible only via operations 
specifically provided for this purpose; whereas the representations of data 
structures are visible, for example through pattern matching. 

— Consequently, standard datatype-generic operations such as pretty-printing 
and comparison, automatically defined or easily derived from the structure 
of an arbitrary datatype, are generally not applicable to codatatypes. 

— Codata is typically infinite, since it may provide operations that yield other 
instances of the same type, as with the new and add operations in the 
complex number example in Section 2 (but records with field extractors 
are an exception to this rule). Data is usually finite (although with lazy 
evaluation, infinite recursive algebraic data structures can be constructed; 
we see an example shortly). 

As it happens, codatatypes are generally greatest fixpoints of recursive type 
equations, whereas datatypes are least fixpoints. In some settings (such as that of 
continuous functions between complete partial orders, as embodied in Haskell for 
example), least and greatest type fixpoints coincide; but in many settings (such 
as total functions between sets, as used in Cockett's Charity [7] and Turner's 
Total Functional Programming [52,53]) the two are distinguished. 

3.1 Greatest fixpoint types as codata

Given a suitable operation F on types (technically, a covariant functor; but for 
simplicity, think of some combination of sums and products), the least fixpoint
μ(F) of F is the smallest type X such that F (X) ≅ X. This corresponds to the
algebraic datatype declaration

data Mu f = In{out :: f (Mu f)}

when read in terms of total functions between sets (rather than continuous
functions between complete partial orders), capturing just the total and finite
data structures of a given shape. The constructor In :: f (Mu f) → Mu f and
destructor out :: Mu f → f (Mu f) are the witnesses to the isomorphism.

There is a well-known technique called Church encoding [3,19] for representing 
such least fixpoint or initial recursive datatypes in the polymorphic lambda 
calculus, without having to introduce new language constructs like data and 
pattern matching. The encoding is as a higher-order functional type: 

μ(F) = ∀X. (F (X) → X) → X

(Technically, strong initiality, conferring also a corresponding proof principle, 
requires additional assumptions, such as parametricity or "theorems for free" in 
the underlying category [46,57].) 

For example, integer lists have shape determined by L (X) = 1 + Integer × X,
where 1 denotes the unit type with a single element, Integer the type of integers,
+ disjoint union, and × Cartesian product. Integer lists therefore have the Church
encoding μ(L) = ∀X. (L (X) → X) → X. Note that, using standard type
isomorphisms, we have L (X) → X ≅ X × (Integer × X → X), so an equivalent
definition is μ(L) = ∀X. (X × (Integer × X → X)) → X. Moreover, similar type
isomorphisms yield a type ∀a. [a] → ∀b. (b, (a, b) → b) → b for the function
foldr from the Haskell standard library [40]. In other words, when specialised to
a = Integer, foldr computes the Church encoding of a list of integers.
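
The Church encoding of integer lists can be written down directly in GHC, as in the sketch below. It assumes the RankNTypes extension and uses the curried form of the encoding; the names ChurchList, toChurch, fromChurch and sumChurch are ours, introduced only for illustration.

```haskell
{-# LANGUAGE RankNTypes #-}

-- mu(L) = forall x. (x, (Integer, x) -> x) -> x, here in curried form.
newtype ChurchList = ChurchList (forall x. x -> (Integer -> x -> x) -> x)

-- foldr, with its arguments reordered, computes the encoding of a list.
toChurch :: [Integer] -> ChurchList
toChurch xs = ChurchList (\nil cons -> foldr cons nil xs)

-- Instantiating x to [Integer] with the real constructors recovers the list.
fromChurch :: ChurchList -> [Integer]
fromChurch (ChurchList fold) = fold [] (:)

-- Any other fold can be run directly on the encoding.
sumChurch :: ChurchList -> Integer
sumChurch (ChurchList fold) = fold 0 (+)
```

The encoded list is nothing more than its own fold: the only way to consume a ChurchList is to supply an algebra, a value for the empty case and a function for the non-empty case.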

What is rather less well known is that this encoding dualises, allowing the 
representation also of greatest fixpoint or final recursive types [62,58]: 

ν(F) = ∃X. (X → F (X)) × X

(Again, parametricity is required in order to deduce strong finality, conferring 
the corresponding proof principle). For example, the type of finite and infinite 
integer lists, the greatest fixpoint ν(L) of the functor L, is encoded as ∃X. (X →
1 + Integer × X) × X. Moreover, standard type isomorphisms yield a type
∀a. (∃b. (b → Maybe (a, b), b)) → [a] for the function unfoldr from the Haskell
standard library [40]. In other words, when specialised to a = Integer, unfoldr 
computes a co-list of integers from its co-Church encoding. 
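
Dually, the co-Church encoding of co-lists can be sketched in GHC as an existential package of a step function and a seed. The sketch assumes the ExistentialQuantification extension and the standard unfoldr from Data.List; the names CoList, toList, nats and fromList are ours.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Data.List (unfoldr)

-- nu(L) = exists x. (x -> 1 + Integer * x) * x, with Maybe standing in for "1 +".
data CoList = forall x. CoList (x -> Maybe (Integer, x)) x

-- unfoldr turns the encoding into an ordinary lazy, possibly infinite, list.
toList :: CoList -> [Integer]
toList (CoList step seed) = unfoldr step seed

-- An infinite co-list: the naturals, with an Integer counter as hidden state.
nats :: CoList
nats = CoList (\n -> Just (n, n + 1)) 0

-- A finite co-list whose hidden state is itself a list.
fromList :: [Integer] -> CoList
fromList = CoList (\xs -> case xs of { [] -> Nothing; y : ys -> Just (y, ys) })
```

Here nats and fromList [0 ..] have different hidden state types, yet no client of CoList can tell them apart; that is the situation the proof methods of the following sections address.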

In summary, whereas least fixpoint types correspond to universal type
quantifications, greatest fixpoint types correspond to existential quantifications.

3.2 Proof methods for codata

So, zeroCG and zeroPG are both elements of a greatest fixpoint type. We might 
expect them to be 'equal' in some sense, since they both represent 'the same' 
complex number. But in what sense could they be equal? They have different 
representations, so straightforward structural comparisons are inappropriate. 

The generally accepted approach to take to equality on codata, such as 
between instances of abstract datatypes, is observational equivalence, or "equality 
as far as we can see" [24]. Two instances of an abstract datatype are clearly 
different if there is an experiment — that is, a sequence of operations provided by 
the signature — yielding distinguishable concrete outputs (which might without 
loss of generality be as primitive as bits, but we take here to include types like 
Integer and Double); and conversely, if no such experiment exists, we consider 
the two instances to be equal. 

That informal characterisation of observational equivalence can be formalised 
in two ways, which turn out to be equivalent: via bisimulation and coinduction 
[36,21,25] or via universal properties of final coalgebras [59,26,45,23]. In this 
context, bisimulation amounts to the same thing as logical relations and relational 
parametricity [31,44]. Bisimulation and universal properties are compared in a 
recent survey paper [15]. That survey applies the techniques to proving equality 
of concrete datatypes, specifically streams, for which structural comparisons are 
also available. It therefore takes the structural comparison as the definition of 
equality, and proves that the other notions coincide with it. Since the present 
paper concentrates on abstract datatypes, structural comparison is unavailable; 
but these two notions of observational equivalence still agree with each other. 

3.3 Final coalgebras 

Here, we take the final coalgebra approach. The single experimental steps available 
on an abstract data structure of type ADT f are captured, with their required 
inputs and specified outputs, by the signature functor f. The tree of all possible
experiments is obtained by repeatedly applying these operations. 

data Tree f = T{unT :: f (Tree f)}

tree :: Functor f ⇒ ADT f → Tree f
tree (D h s) = unfold h s where
  unfold :: Functor f ⇒ (a → f a) → a → Tree f
  unfold f x = T (fmap (unfold f) (f x))

As we noted above, because some experimental steps may yield new abstract 
data structures, observation trees will typically be infinite; accordingly, Tree f is 
the greatest fixpoint of f. Nevertheless, we consider Tree to be a type of data
rather than of codata: it is amenable to pattern matching and to structural 
datatype-generic operations such as pretty-printing and comparison. Although 
in Haskell Mu f and Tree f coincide semantically, we use different datatypes to 
reinforce the distinction between least and greatest fixpoints. 

Now, the claim that the abstract data structures zeroCG and zeroPG are
observationally equivalent reduces to a more amenable statement that the concrete 
data structures tree zeroCG and tree zeroPG are structurally equal: an experiment 
distinguishing zeroCG and zeroPG corresponds to a difference between the two 
trees, and the absence of such an experiment implies the equality of those trees. 
The claim is still not effectively decidable, because the trees are both infinitely 
deep and infinitely wide; but at least it is now open to proof via familiar equational 
reasoning at the meta-level. 
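
Although equality of observation trees is not decidable, finite prefixes of two trees can be compared mechanically, which is a useful sanity check before attempting a proof. The sketch below illustrates this for a deliberately small signature of our own, CounterF, whose trees have a single branch; CounterF, upFromInt, upFromList and observe are hypothetical names, not part of the development above.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

data ADT f = forall s. D (s -> f s) s

data Tree f = T { unT :: f (Tree f) }

unfold :: Functor f => (a -> f a) -> a -> Tree f
unfold f x = T (fmap (unfold f) (f x))

tree :: Functor f => ADT f -> Tree f
tree (D h s) = unfold h s

-- Hypothetical signature: observe an Int, then step to a new state.
data CounterF s = CounterF Int s

instance Functor CounterF where
  fmap g (CounterF n s) = CounterF n (g s)

-- Two implementations with different hidden state types.
upFromInt :: ADT CounterF
upFromInt = D (\n -> CounterF n (n + 1)) 0

upFromList :: ADT CounterF
upFromList = D (\us -> CounterF (length us) (() : us)) []

-- The first n observations along the single-branch observation tree.
observe :: Int -> Tree CounterF -> [Int]
observe n (T (CounterF v t))
  | n <= 0    = []
  | otherwise = v : observe (n - 1) t
```

Agreement of observe n (tree upFromInt) and observe n (tree upFromList) for a particular n is only evidence of observational equivalence, not a proof; a disagreement, on the other hand, is a genuine distinguishing experiment.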

3.4 Proving equivalence 

Observational equivalence of the complex numbers zeroCG and zeroPG follows 
from structural equality of their observation trees tree zeroCG and tree zeroPG, 
which can be demonstrated using the universal property of unfold: 

h = unfold f ⇔ unT · h = fmap h · f

We have 

tree zeroCG = tree zeroPG ⇔ unfold fc (0.0, 0.0) = unfold fp (P 0.0 0.0)

But it isn't immediately obvious how to apply the universal property here: this is 
not an equation between two functions of the form unfold h, but rather between 
two trees of the form unfold h s. How can we move forward?

Fortunately, this is a somewhat special case, because there is a simulation
relationship between the two instances. Specifically, we can abstract from the initial
state (0.0, 0.0) of the Cartesian implementation, since (0.0, 0.0) = p2c (P 0.0 0.0),
obtaining the proof obligation unfold fc · p2c = unfold fp. This equation can be
proved using the fusion law of unfold, a simple corollary of the universal property:

unfold f · g = unfold f' ⇐ f · g = fmap g · f'

All that remains is to establish the premise, fc · p2c = fmap p2c · fp; that is,
that p2c is the abstraction function relating fc and fp. The only property of
complex numbers required in the proof is that p2c · c2p = id. The calculation
can be found in an appendix (Section 6). 
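
For completeness, here is a sketch of how the fusion law follows from the universal property. We write f' for the second coalgebra (fp in the instance above, with f = fc and g = p2c), and check that h = unfold f · g satisfies the right-hand side of the universal property for f':

```latex
\begin{align*}
\mathit{unT} \cdot (\mathit{unfold}\;f \cdot g)
  &= \mathit{fmap}\;(\mathit{unfold}\;f) \cdot f \cdot g
     && \text{universal property, with } h = \mathit{unfold}\;f \\
  &= \mathit{fmap}\;(\mathit{unfold}\;f) \cdot \mathit{fmap}\;g \cdot f'
     && \text{premise: } f \cdot g = \mathit{fmap}\;g \cdot f' \\
  &= \mathit{fmap}\;(\mathit{unfold}\;f \cdot g) \cdot f'
     && \text{functor law for } \mathit{fmap}
\end{align*}
```

Reading the universal property from right to left with h = unfold f · g then gives unfold f · g = unfold f'.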

We chose here to abstract from the initial state of the Cartesian
implementation of the abstract datatype, effectively expressing that implementation in
terms of the polar representation. This particular proof of equivalence is doubly
special, because the simulation also works the other way around: we could have
abstracted the initial state P 0.0 0.0 of the polar implementation instead. (The
only complication in doing so is that c2p · p2c is not quite the identity function;
however, it is the identity on the reachable states: those with non-negative
magnitude, phase between 0 and 2π, and zero phase if zero magnitude.)

In general, given two implementations of an abstract datatype, neither will 
simulate the other; instead, each introduces extensions inexpressible by the 
other. In that case, each can be shown observationally equivalent to a third 
implementation that can simulate both. We will see an example in Section 4.8. 

4 Stream fusion

Coutts et al. [9] present an elegant technique for obtaining better fusion of 
list functions, by reimplementing the Standard Haskell list library [40] to use 
internally an abstract datatype (of 'streams') rather than the familiar algebraic 
datatype of lists; we summarise work in Sections 4.1-4.4 and 4.7 below. However, 
they don't prove that their reimplementation is sound; we present such a proof 
in the remainder of this section. 

4.1 An abstract datatype of streams 

A simplistic version of Coutts et al.'s approach uses a curried version of Haskell's 
Maybe datatype on pairs as the signature of the stream abstract datatype: 

data Maybe2 a b = Nothing2 | Just2 a b
type Stream a = ADT (Maybe2 a)

Thus, a stream has two components, qualified by some existentially bound state 
type s: an internal state of type s, and a body that when applied to such a 
state yields either a head and a new state, or nothing. (In other words, the 
greatest-fixpoint or co-Church encoding is being used.) For example, here is one 
implementation of the string "abc": 

abc = D h 0 where
  h i = case i of
    0 → Just2 'a' 1
    1 → Just2 'b' 2
    2 → Just2 'c' 3
    3 → Nothing2

The approach can be seen as an implementation of the Iterator design pattern 
from object-oriented programming [13]. The full story of the approach permits a 
third outcome of the stream body, to deal with nested recursions; we return to 
this point in Section 4.7 below. 

4.2 Stream operations 

The list library is redefined in terms of streams. For example, the map function 
on lists is reimplemented as follows: 

mapS f (D h s) = D h' s where
  h' s = case h s of Nothing2   → Nothing2
                     Just2 x s' → Just2 (f x) s'

This function is a rather special case, because the internal representation of the 
stream mapS f xs is the same as that of xs. In general, internal representations 
change; for example, the zip function is: 

zipS :: Stream a → Stream b → Stream (a, b)
zipS (D h s) (D j t) = D k (s, t) where
  k (s, t) = case (h s, j t) of
    (Just2 x s', Just2 y t') → Just2 (x, y) (s', t')
    _                        → Nothing2

The internal representation (s, t) of the result is of a different type to the 
representations s and t of the two arguments. 

4.3 A list interface 

The standard library is reimplemented using streams, but the interface presented 
to the programmer still uses the familiar algebraic datatype of lists; therefore, 
conversion functions stream :: [a] → Stream a and unstream :: Stream a → [a]
are needed. 

stream :: [a] → Stream a
stream xs = D uncons xs where
  uncons :: [a] → Maybe2 a [a]
  uncons xs = if null xs then Nothing2 else Just2 (head xs) (tail xs)

unstream :: Stream a → [a]
unstream (D h s) = unfoldr h s where
  unfoldr :: (b → Maybe2 a b) → b → [a]
  unfoldr f y = case f y of Nothing2 → []; Just2 x y' → x : unfoldr f y'

For example, the familiar map on lists is retrieved by

map f = unstream · mapS f · stream

(In fact, unstream is essentially a specialisation of tree.)
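
The stream definitions of Sections 4.1 to 4.3 can be assembled into one GHC module, as sketched below. The sketch assumes the ExistentialQuantification extension, writes uncons by pattern matching rather than with null, head and tail, and names the recovered list function mapL to avoid a clash with the Prelude's map; these are our choices, not those of Coutts et al.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

data ADT f = forall s. D (s -> f s) s

data Maybe2 a b = Nothing2 | Just2 a b

type Stream a = ADT (Maybe2 a)

stream :: [a] -> Stream a
stream = D uncons
  where
    uncons []       = Nothing2
    uncons (x : xs) = Just2 x xs

unstream :: Stream a -> [a]
unstream (D h s) = go s
  where
    go y = case h y of
      Nothing2   -> []
      Just2 x y' -> x : go y'

-- Non-recursive: the state type of the result is that of the argument.
mapS :: (a -> b) -> Stream a -> Stream b
mapS f (D h s) = D h' s
  where
    h' t = case h t of
      Nothing2   -> Nothing2
      Just2 x t' -> Just2 (f x) t'

-- Non-recursive: the state of the result pairs the two argument states.
zipS :: Stream a -> Stream b -> Stream (a, b)
zipS (D h s) (D j t) = D k (s, t)
  where
    k (u, v) = case (h u, j v) of
      (Just2 x u', Just2 y v') -> Just2 (x, y) (u', v')
      _                        -> Nothing2

-- The list-level map, recovered through the stream interface.
mapL :: (a -> b) -> [a] -> [b]
mapL f = unstream . mapS f . stream
```

All of the recursion is confined to unstream; mapS and zipS merely transform the step function, which is what makes them amenable to fusion.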

4.4 Eliminating conversions 

The crucial point in Coutts et al.'s work is the elimination of redundant
conversions in the composition of list operations, such as in the composition of two
maps:

unstream · mapS f · stream · unstream · mapS g · stream

Here, the double conversion stream · unstream from streams to lists and back
again is redundant. If it could be eliminated, the two occurrences of mapS would
become adjacent; and because the definition of mapS is non-recursive, standard
compiler optimisations (specifically, a case-of-case optimisation) can relatively
easily fuse their bodies. The actual elimination of stream · unstream itself is easy,
using the Glasgow Haskell Compiler's programmer-definable rewrite rules [42].

In fact, stream · unstream is not quite the identity: stream (unstream ⊥) equals
D uncons ⊥ rather than ⊥. In practice, this difference does not cause a problem
if (a) the Stream datatype is not exported from the library, and (b) the library
itself does not construct bottom values of type Stream; so their implementation 
is carefully arranged to satisfy these conditions. 

Coutts et al. do claim (implicitly) that stream (unstream (D h s)) = D h s,
but provide no proof of this claim; they say that "it is not entirely trivial to 
define a useful equivalence relation on streams [. . . ] due to the fact that a single 
list can be modeled by infinitely many streams" [9, p320]. 

4.5 Destroying streams 

In fact, there is a simple proof of the stream · unstream identity: it is an instance
of the destroy/unfoldr rule [48], the dual of the better-known foldr/build rule
[18]. The function destroy is defined as follows:

destroy :: (∀b. (b → Maybe2 a b) → b → c) → [a] → c
destroy g = g uncons

so that stream xs = destroy D xs. Then the destroy/unfoldr rule states that

destroy g (unfoldr f s) = g f s

The proof of this rule is a straightforward application of Reynolds' parametricity
[47]: the free theorem [57] of the type of the argument g of destroy is

g ψ · f = g φ ⇐ ψ · f = fmap f · φ

Letting ψ = uncons and f = unfoldr φ, for which the premise holds by the
definition of unfoldr, gives the destroy/unfoldr rule.
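
The destroy/unfoldr rule can be exercised on a concrete instance in GHC, as in the following sketch. It assumes the RankNTypes extension; the consumer total and the producer countdown are hypothetical examples of ours, chosen only so that both sides of the rule can be evaluated.

```haskell
{-# LANGUAGE RankNTypes #-}

data Maybe2 a b = Nothing2 | Just2 a b

uncons :: [a] -> Maybe2 a [a]
uncons []       = Nothing2
uncons (x : xs) = Just2 x xs

unfoldr :: (b -> Maybe2 a b) -> b -> [a]
unfoldr f y = case f y of
  Nothing2   -> []
  Just2 x y' -> x : unfoldr f y'

-- The consumer g must be polymorphic in the state type b.
destroy :: (forall b. (b -> Maybe2 a b) -> b -> c) -> [a] -> c
destroy g = g uncons

-- Hypothetical consumer: sum whatever a step function yields from a seed.
total :: (b -> Maybe2 Int b) -> b -> Int
total step = go 0
  where
    go acc y = case step y of
      Nothing2   -> acc
      Just2 x y' -> go (acc + x) y'

-- Hypothetical producer: count down from the seed to 1.
countdown :: Int -> Maybe2 Int Int
countdown n = if n <= 0 then Nothing2 else Just2 n (n - 1)
```

Here destroy total (unfoldr countdown 4) builds the list [4, 3, 2, 1] and then consumes it, whereas total countdown 4 computes the same sum without any intermediate list. A single test of this kind illustrates the rule; it is parametricity that establishes it for every g.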

We note in passing that whereas foldr/build fusion has turned out to be
a little disappointing in its applicability [33], destroy/unfoldr fusion seems to
have a much wider scope. For example, it is straightforward to apply the latter
technique to zip-like functions and functions with accumulating parameters [48],
avoiding the need for augment-like generalisations [17]. This additional promise
lends weight to our advocacy for greater appreciation of corecursive programming
[16].

It is quite natural, so to speak, that equivalence proofs for abstract datatypes
boil down to applications of parametricity. Intuitively, parametricity results
capture "the only thing you can do, for type reasons". For example, for the
abstract data structure D h s, the type of the representation s is hidden, and so
"the only thing you can do, for type reasons", is to apply h to s. Indeed, Reynolds
[46] originally called his result the "representation theorem", and motivated it
by appeal to independence from a choice of representation: "We expect that the
meaning of [...] a program will remain unchanged if the [definition of an abstract
datatype] is altered by changing the representation of the type and redefining its
primitive operations in a consistent manner" [46].

4.6 Unfolding observations

Another way of looking at Coutts et al.'s problem is to remember that streams
are an abstract datatype, and so structural equivalence is the wrong tool to use; 
observational equivalence is what is needed. 
In this case, since we have by definition that 

stream (unstream (D h s)) = D uncons (unfoldr h s) 

it suffices to show observational equivalence of D h s and D uncons (unfoldr h s). 
(Notice that these two streams will generally have different representations. The 
latter necessarily uses a list for the state, whereas the former may have an 
arbitrary state type.) 

As we argued in Section 3.3, observational equivalence of abstract data
structures is just structural equivalence of their unfoldings to the final coalgebra. For
datatype Stream a, the signature is the functor Maybe2 a, whose final coalgebra
is possibly infinite lists of as, and the operation to build such a list is the familiar
but underappreciated unfoldr [16]. So we have to prove

unfoldr uncons (unfoldr h s) = unfoldr h s

This follows easily from the universal property

h = unfoldr f ⇔ uncons · h = fmap h · f

of unfoldr: instantiating h = id and f = uncons gives unfoldr uncons = id, since
uncons · id = fmap id · uncons. This alternative proof using the universal property
of unfold is important: as we shall see in Section 4.8, it seems to generalise better
than the parametricity-based proof underlying destroy/unfoldr.
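
The equation to be proved can be checked on a small instance using the standard library directly. The sketch below uses unfoldr and uncons from Data.List, which work with Maybe of a pair rather than with Maybe2; the coalgebra h is a hypothetical example of ours.

```haskell
import Data.List (uncons, unfoldr)

-- A sample coalgebra: count down from the seed to 1.
h :: Int -> Maybe (Int, Int)
h n = if n <= 0 then Nothing else Just (n, n - 1)

-- The list observed from the stream D h 3.
direct :: [Int]
direct = unfoldr h 3

-- The list observed after a round trip through stream . unstream:
-- the state is now a list, and the step function is uncons.
roundTrip :: [Int]
roundTrip = unfoldr uncons direct
```

Both direct and roundTrip evaluate to the same list, as the identity unfoldr uncons = id predicts.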

4.7 Streams that skip 

The simple version above of Coutts et al.'s story uses a representation of streams
providing a single observation, yielding either no information (for an empty 
stream) or a head and the state for a tail (for a non-empty stream). In fact, the 
complete story is more sophisticated, allowing a third outcome: a new state, but 
no head. 

data Step a s = Done | Yield a s | Skip s
type SStream a = ADT (Step a)

The extra outcome is needed to support operations such as filtering, which do 
not produce an element at every step — when the filter discards an element from 
an underlying stream, or that stream skips itself, then the outer stream skips 
instead of yielding: 

filterS :: (a → Bool) → SStream a → SStream a
filterS p (D h s) = D (try h p) s where
  try h p s = case h s of
    Done       → Done
    Skip s'    → Skip s'
    Yield x s' → if p x then Yield x s' else Skip s'

Without the possibility of skipping, the body of the stream would have to be 
recursive, thereby complicating fusion optimisations. 

The conversion from lists is very similar to the simple case: 

sstream :: [a] -> SStream a 
sstream xs = D unconsS xs where 
  unconsS :: [a] -> Step a [a] 
  unconsS []       = Done 
  unconsS (x : xs) = Yield x xs 

The conversion back to lists is more involved, because of the need to handle skips: 

unsstream :: SStream a -> [a] 
unsstream (D h s) = unfoldr (force h) s where 
  force :: (s -> Step a s) -> (s -> Maybe2 a s) 
  force h s = case h s of 
    Done       -> Nothing2 
    Yield x s' -> Just2 x s' 
    Skip s'    -> force h s' 

Note that the body force of the unfold here is recursive, so it would be difficult 
for standard compiler optimisations to fuse a following function. That is not a 
big problem, because unsstream is intended to be used only when leaving the 
improved implementation of the list library, when fusion is not expected anyway. 
Moreover, note that unsstream may be unproductive, although for example even 
filterS (const False) is always productive: that particular lump in the carpet has 
been shuffled under the furniture, but no library reimplementation can eliminate 
it altogether. 
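
The definitions of this section can be assembled into a small runnable sketch. We assume here that the existential ADT is specialised directly to Step, and we use Data.List.unfoldr, whose step function returns Maybe (a, s) where the text writes Maybe2 a s; the local function try closes over h and p instead of taking them as arguments.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Data.List (unfoldr)

data Step a s = Done | Yield a s | Skip s

-- Skipping streams, with the state type hidden.
data SStream a = forall s. D (s -> Step a s) s

sstream :: [a] -> SStream a
sstream xs = D unconsS xs
  where
    unconsS []       = Done
    unconsS (y : ys) = Yield y ys

-- The step function is not recursive: a rejected element becomes a Skip.
filterS :: (a -> Bool) -> SStream a -> SStream a
filterS p (D h s) = D try s
  where
    try st = case h st of
      Done       -> Done
      Skip s'    -> Skip s'
      Yield x s' -> if p x then Yield x s' else Skip s'

-- force steps past Skips until it finds an element or the end of the stream.
force :: (s -> Step a s) -> s -> Maybe (a, s)
force h s = case h s of
  Done       -> Nothing
  Yield x s' -> Just (x, s')
  Skip s'    -> force h s'

unsstream :: SStream a -> [a]
unsstream (D h s) = unfoldr (force h) s
```

For instance, unsstream (filterS even (sstream [1 .. 10])) agrees with filter even [1 .. 10]. By contrast, unsstream (filterS (const False) (sstream [1 ..])) never produces a constructor, although each individual step of the filtered stream returns promptly with a Skip.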

4.8 Reasoning with skips 

The presence of skips has interesting consequences for proofs. We should no longer 
take pure observational equivalence as the appropriate notion of equality on 
skipping streams, because we ought to treat some observationally distinguishable 
skipping streams as effectively equivalent. Coutts et al. say that "equivalence 
on streams should be defined modulo Skip values [. . . ] semantics should not be 
affected by the presence or absence of Skip values" [9, p320]. 

For example, consider the stream version of the standard list function concat: 

concatS :: SStream (SStream a) -> SStream a 
concatS (D hs ss) = D he (Nothing, ss) where 
  he (Nothing, ss) = case hs ss of 
    Done        -> Done 
    Skip ss'    -> Skip (Nothing, ss') 
    Yield s ss' -> Skip (Just s, ss') 
  he (Just (D ha sa), ss) = case ha sa of 
    Done        -> Skip (Nothing, ss) 
    Skip sa'    -> Skip (Just (D ha sa'), ss) 
    Yield y sa' -> Yield y (Just (D ha sa'), ss) 

In order to maintain a non-recursive body, this uses a rather complex internal 
state consisting of an optional SStream a and the internal state of a remaining 
SStream (SStream a); only if the former is present and yielding does the whole 
yield. We might expect the following property — one of the monad laws for 
streams — to hold: 

concatS · wrapS = id 

where wrapS wraps an element up as a singleton stream: 

wrapS :: a -> SStream a 
wrapS x = D fetch (Just x) where 
  fetch :: Maybe a -> Step a (Maybe a) 
  fetch (Just x) = Yield x Nothing 
  fetch Nothing  = Done 

The two sides of the property are not even observationally equivalent, because 
the left-hand side concatS · wrapS introduces quite a few extra Skips. 

In fact, the appropriate notion of "equivalence modulo Skips" is obtained 
precisely by taking structural equality on their unfoldings to lists: 

unsstream · concatS · wrapS = unsstream · id 

The proof of this latter property is a fairly straightforward (albeit somewhat 
tedious) application of the universal property of unfoldr; it is relegated to an 
appendix (Section 7). 
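
Both claims can be observed on a small example. The sketch below makes several assumptions: he is lifted to the top level and takes the outer step function hs as a parameter, unfoldr is the Data.List version over Maybe (a, s), and stepCount is a hypothetical helper that counts calls to the step function up to and including the one returning Done (so it is meaningful only for finite streams).

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Data.List (unfoldr)

data Step a s = Done | Yield a s | Skip s
data SStream a = forall s. D (s -> Step a s) s

sstream :: [a] -> SStream a
sstream xs = D unconsS xs
  where
    unconsS []       = Done
    unconsS (y : ys) = Yield y ys

-- Step function of concatS, parameterised by the outer step function hs.
he :: (s -> Step (SStream a) s)
   -> (Maybe (SStream a), s) -> Step a (Maybe (SStream a), s)
he hs (Nothing, ss) = case hs ss of
  Done        -> Done
  Skip ss'    -> Skip (Nothing, ss')
  Yield s ss' -> Skip (Just s, ss')
he _ (Just (D ha sa), ss) = case ha sa of
  Done        -> Skip (Nothing, ss)
  Skip sa'    -> Skip (Just (D ha sa'), ss)
  Yield y sa' -> Yield y (Just (D ha sa'), ss)

concatS :: SStream (SStream a) -> SStream a
concatS (D hs ss) = D (he hs) (Nothing, ss)

fetch :: Maybe a -> Step a (Maybe a)
fetch (Just x) = Yield x Nothing
fetch Nothing  = Done

wrapS :: a -> SStream a
wrapS x = D fetch (Just x)

force :: (s -> Step a s) -> s -> Maybe (a, s)
force h s = case h s of
  Done       -> Nothing
  Yield x s' -> Just (x, s')
  Skip s'    -> force h s'

unsstream :: SStream a -> [a]
unsstream (D h s) = unfoldr (force h) s

-- Hypothetical helper: number of steps taken, including the final Done.
stepCount :: SStream a -> Int
stepCount (D h s) = go 1 s
  where
    go n st = case h st of
      Done       -> n
      Skip s'    -> go (n + 1) s'
      Yield _ s' -> go (n + 1) s'
```

With s = sstream "ab", the streams s and concatS (wrapS s) take different numbers of steps, so they are observationally distinguishable, but unsstream maps both to "ab".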

The alternative proof technique in terms of the universal property of unfoldr 
is important, because the destroy / unfoldr rule used in Section 4.5 does not seem 
to generalise nicely to skipping streams. The analogous development would be to 
introduce a function 

destroyS :: (∀b. (b -> Step a b) -> b -> c) -> [a] -> c 
destroyS g = g unconsS 

so that sstream xs = destroyS D xs. The free theorem of the type of the argument g
of destroyS is 

ψ · f = fmap f · φ  ⇒  g ψ · f = g φ 

but it isn't clear how to instantiate this equation to obtain the desired result; 
indeed, 'equivalence modulo Skips' feels more ad hoc than parametric. 
5 Conclusions 

5.1 Related work 

Data abstraction has long been recognised as a crucial tool in managing the 
complexity of software systems [37,30]. Pattern matching on algebraic datatypes 
is also widely appreciated as an extremely convenient technique [55]. But as 
Wadler's proposal [56] noted twenty years ago, it is difficult to marry the two 
together: data abstraction depends on hiding a data representation that pattern 
matching relies on revealing. There have been numerous other proposals for 
combining data abstraction with pattern matching over the years [5,35,6,34,51], 
and indeed a recent flurry of activity in the area [49,11,41,61]. 

One could look at final coalgebra semantics as a disciplined way of thinking 
about pattern matching over abstract datatypes. Rather than trying to force 
these two somewhat conflicting ideas together, one could instead define a view of 
codata (supporting abstraction) as data (supporting pattern matching), using 
the function tree from Section 3.3. In case a full transformation from 'completely 
codata' to 'completely data' is inappropriate, simply apply the body of the 
abstract datatype once: 

unpack :: Functor f => ADT f -> f (ADT f) 
unpack (D h s) = fmap (D h) (h s) 

This yields a piece of data (the outermost type constructor f) with codata 
as components (the inner occurrences of ADT f). This construction justifies a 
number of earlier attempts to treat algebraic datatypes abstractly [50,12,60,38]. 
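
As a minimal sketch of how unpack supports pattern matching on an abstract value, consider the following. The functor StreamF, the stream from, and the observer takeS are hypothetical names introduced only for illustration, and the Functor context on the datatype declaration is omitted.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

data ADT f = forall s. D (s -> f s) s

-- One layer of data, with abstract values as components.
unpack :: Functor f => ADT f -> f (ADT f)
unpack (D h s) = fmap (D h) (h s)

-- Hypothetical signature functor for streams of a.
data StreamF a s = Nil | Cons a s

instance Functor (StreamF a) where
  fmap _ Nil        = Nil
  fmap g (Cons x s) = Cons x (g s)

-- Hypothetical example: the integers from n upwards; the state is an Integer.
from :: Integer -> ADT (StreamF Integer)
from n = D (\k -> Cons k (k + 1)) n

-- Pattern matching on the outermost constructor, never on the hidden state.
takeS :: Int -> ADT (StreamF a) -> [a]
takeS n s
  | n <= 0    = []
  | otherwise = case unpack s of
      Nil       -> []
      Cons x s' -> x : takeS (n - 1) s'
```

The client takeS sees only the constructors Nil and Cons of the signature; the tail s' remains an abstract value of type ADT (StreamF a).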

The idea of using final coalgebras as the semantics of abstract datatypes has a 
long history. Wand [59] writes that "an abstract data type is a final object in the 
category of its representations", and Kamin [26] that "only externally observable 
behavior matters [. . . ] the final data type is the most abstract realization of any 
given data abstraction." Considering how close the relationship between abstract 
datatypes and object-oriented classes is, it is surprising that it seems to have 
taken over a decade for the idea to arise that final coalgebras provide a semantics 
for classes too [45,23]. For a good historical review of coinduction for behavioural 
satisfaction, see [20]. 

5.2 Summary 

We have presented an approach to reasoning about abstract datatypes in a 
functional language, based on the (well-known) model of abstraction through 
existential quantification over the hidden representation type [32,28,29], and 
the (somewhat less well-known and appreciated) reasoning principles for codata 
through universal properties of final coalgebras [16,15]. We have illustrated this 
approach by considering a problem arising from Coutts et al.'s work [9] on stream 
fusion. In a nutshell, we advocate the following steps for reasoning about abstract 
datatypes: 
- express the signature as a (strictly positive) functor f; 

- enforce the abstraction via existential quantification: 

  data Functor f => ADT f = ∃s. D (s -> f s) s 

- capture observations as concrete data: 

  data Tree f = T{ unT :: f (Tree f) } 

- transform abstract data to concrete data: 

  tree :: Functor f => ADT f -> Tree f 
  tree (D h s) = unfold h s where 
    unfold :: Functor f => (a -> f a) -> a -> Tree f 
    unfold f x = T (fmap (unfold f) (f x)) 

- exploit the universal property of unfold for reasoning: 

  h = unfold f  ⇔  unT · h = fmap h · f 

- view data as a mixture of concrete and abstract: 

  unpack :: Functor f => ADT f -> f (ADT f) 
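
The steps above can be sketched in runnable form as follows. This is an illustration under stated assumptions rather than a definitive implementation: the functor StreamF, the two streams evensA and evensB, and the observer prefix are hypothetical names, and the Functor context on the ADT declaration is omitted.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

data ADT f = forall s. D (s -> f s) s

-- Observations as concrete data.
data Tree f = T { unT :: f (Tree f) }

unfold :: Functor f => (a -> f a) -> a -> Tree f
unfold f x = T (fmap (unfold f) (f x))

tree :: Functor f => ADT f -> Tree f
tree (D h s) = unfold h s

-- Hypothetical signature functor for streams of a.
data StreamF a s = Nil | Cons a s

instance Functor (StreamF a) where
  fmap _ Nil        = Nil
  fmap g (Cons x s) = Cons x (g s)

-- Two representations of the even numbers: state k stands for 2 * k in
-- evensA, whereas the state of evensB is the even number itself.
evensA, evensB :: ADT (StreamF Integer)
evensA = D (\k -> Cons (2 * k) (k + 1)) 0
evensB = D (\m -> Cons m (m + 2)) 0

-- Finite prefix of the observations, so that trees can be compared.
prefix :: Int -> Tree (StreamF a) -> [a]
prefix n (T t)
  | n <= 0    = []
  | otherwise = case t of
      Nil       -> []
      Cons x t' -> x : prefix (n - 1) t'
```

Writing stepA and stepB for the two step functions and h k = 2 * k, we have stepB · h = fmap h · stepA, so the universal property gives unfold stepB · h = unfold stepA, and hence tree evensA = tree evensB; comparing finite prefixes of the two trees merely tests this.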
6 Appendix: Equivalence of complex numbers 

Section 3.4 shows how to prove observational equivalence of different implementations
of complex numbers, using the fusion property of unfold. Here, we discharge 
the proof obligation 

fc · p2c = fmap p2c · fp 

using the (idealised, since not quite valid in approximate floating point numbers) 
equation relating polar and Cartesian representations 

p2c · c2p = id 

and the definitions 



fc (x, y) = CF{_new = λz -> z, 
               _add = λc -> (x + re c, y + im c), 
               _rea = x, 
               _ima = y} 

fp p      = CF{_new = λz -> c2p z, 
               _add = λc -> let (x, y) = p2c p in 
                            c2p (x + re c, y + im c), 
               _rea = fst (p2c p), 
               _ima = snd (p2c p)} 

Note also that the appropriate definition of fmap on ComplexF is 

fmap f c = CF{_new = λz -> f (_new c z), 
              _add = λc' -> f (_add c c'), 
              _rea = _rea c, 
              _ima = _ima c} 

We calculate: 

  fmap p2c (fp p) 
=   { fp } 
  fmap p2c (CF{_new = λz -> c2p z, 
               _add = λc -> let (x, y) = p2c p in 
                            c2p (x + re c, y + im c), 
               _rea = fst (p2c p), 
               _ima = snd (p2c p)}) 
=   { fmap on ComplexF } 
  CF{_new = λz -> p2c (c2p z), 
     _add = λc -> let (x, y) = p2c p in 
                  p2c (c2p (x + re c, y + im c)), 
     _rea = fst (p2c p), 
     _ima = snd (p2c p)} 
=   { p2c · c2p = id } 
  CF{_new = λz -> z, 
     _add = λc -> let (x, y) = p2c p in 
                  (x + re c, y + im c), 
     _rea = fst (p2c p), 
     _ima = snd (p2c p)} 
=   { lift out the let binding } 
  let (x, y) = p2c p in 
  CF{_new = λz -> z, 
     _add = λc -> (x + re c, y + im c), 
     _rea = x, 
     _ima = y} 
=   { fc } 
  let (x, y) = p2c p in fc (x, y) 
=   { application } 
  fc (p2c p) 

which completes the proof. 

7 Appendix: Equivalence of skipping streams 

Section 4.8 makes a claim about the observational equivalence modulo Skips of 
the skipping streams concatS (wrapS s) and s, where 

concatS :: SStream (SStream a) -> SStream a 
concatS (D hs ss) = D he (Nothing, ss) where 
  he (Nothing, ss) = case hs ss of 
    Done        -> Done 
    Skip ss'    -> Skip (Nothing, ss') 
    Yield s ss' -> Skip (Just s, ss') 
  he (Just (D ha sa), ss) = case ha sa of 
    Done        -> Skip (Nothing, ss) 
    Skip sa'    -> Skip (Just (D ha sa'), ss) 
    Yield y sa' -> Yield y (Just (D ha sa'), ss) 

wrapS :: a -> SStream a 
wrapS x = D fetch (Just x) where 
  fetch (Just x) = Yield x Nothing 
  fetch Nothing  = Done 

The claim boils down to the following equation between functions on lists, 
unsstream · concatS · wrapS = unsstream 

where 

unsstream :: SStream a -> [a] 
unsstream (D h s) = unfoldr (force h) s where 
  force h s = case h s of 
    Done       -> Nothing2 
    Yield x s' -> Just2 x s' 
    Skip s'    -> force h s' 

Consider for example the skipping stream s = D h 0 where 

h n = [Skip 1, Yield 'a' 2, Skip 3, Yield 'b' 4, Skip 5, Done] !! n 

(The operation '!!' denotes list indexing.) Unwinding this stream proceeds through 
each of the above states in turn, yielding in total the list of characters [ 'a', 'b']. 
The stream concatS (wrapS s), on the other hand, exhibits the following be- 
haviour: 



state                       output 
(Nothing, Just (D h 0))     Skip 
(Just (D h 0), Nothing)     Skip 
(Just (D h 1), Nothing)     Yield 'a' 
(Just (D h 2), Nothing)     Skip 
(Just (D h 3), Nothing)     Yield 'b' 
(Just (D h 4), Nothing)     Skip 
(Just (D h 5), Nothing)     Skip 
(Nothing, Nothing)          Done 



Each row of the table presents a state and the output from that state, omitting 
the successor state if present. For example, 

he (Just (D h 1), Nothing) = Yield 'a' (Just (D h 2), Nothing) 
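
The sequence of outputs for this example can be reproduced mechanically. In the sketch below, he is lifted to the top level and parameterised by the outer step function (so that he fetch is the step function of concatS (wrapS s)), and outputs is a hypothetical helper that records the result of each step as a string; it terminates only on finite streams.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

data Step a s = Done | Yield a s | Skip s
data SStream a = forall s. D (s -> Step a s) s

he :: (s -> Step (SStream a) s)
   -> (Maybe (SStream a), s) -> Step a (Maybe (SStream a), s)
he hs (Nothing, ss) = case hs ss of
  Done        -> Done
  Skip ss'    -> Skip (Nothing, ss')
  Yield s ss' -> Skip (Just s, ss')
he _ (Just (D ha sa), ss) = case ha sa of
  Done        -> Skip (Nothing, ss)
  Skip sa'    -> Skip (Just (D ha sa'), ss)
  Yield y sa' -> Yield y (Just (D ha sa'), ss)

concatS :: SStream (SStream a) -> SStream a
concatS (D hs ss) = D (he hs) (Nothing, ss)

fetch :: Maybe a -> Step a (Maybe a)
fetch (Just x) = Yield x Nothing
fetch Nothing  = Done

wrapS :: a -> SStream a
wrapS x = D fetch (Just x)

-- The example stream s = D h 0 from the text.
example :: SStream Char
example = D h 0
  where
    h :: Int -> Step Char Int
    h n = [Skip 1, Yield 'a' 2, Skip 3, Yield 'b' 4, Skip 5, Done] !! n

-- Hypothetical helper: the output of each step, omitting successor states.
outputs :: Show a => SStream a -> [String]
outputs (D h s) = go s
  where
    go st = case h st of
      Done       -> ["Done"]
      Skip s'    -> "Skip" : go s'
      Yield x s' -> ("Yield " ++ show x) : go s'
```

Evaluating outputs (concatS (wrapS example)) lists the eight outputs of the table in order, whereas outputs example lists only the six outputs of the original stream.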

Evidently the closest match between these states and those of the original stream s 
are those whose first component is a Just. We therefore proceed to show that 

  unsstream (concatS (wrapS s)) 
=   { definitions } 
  unfoldr (force (he fetch)) (Nothing, Just (D h n)) 
=   { expanding: f x = f y  ⇒  unfoldr f x = unfoldr f y } 
  unfoldr (force (he fetch)) (Just (D h n), Nothing) 
=   { unfoldr fusion } 
  unfoldr (force h) n 
=   { definitions } 
  unsstream (D h n) 

For the 'expansion' step, it is easy to verify that 
fetch (Just (D h n)) = Yield (D h n) Nothing 
and so 

he fetch (Nothing, Just (D h n)) = Skip (Just (D h n), Nothing) 
and so 

force (he fetch) (Nothing, Just (D h n)) = force (he fetch) (Just (D h n), Nothing) 
as required. 



For the 'fusion' step, we define 

inject n = (Just (D h n), Nothing) 
so that the obligation is to prove 

unfoldr (force (he fetch)) · inject = unfoldr (force h) 
This can be done using the fusion rule for unfoldr: 

unfoldr f · g = unfoldr f'  ⇐  f · g = fmap (prod id g) · f' 
which reduces the obligation to showing that 

force (he fetch) · inject = fmap (prod id inject) · force h 

This last step has to be done using fixpoint induction, because force is not defined 
using a structured form of recursion. We simplify both sides to the point at which 
they make a recursive call to force; the surrounding contexts turn out to be 
equal, and so fixpoint induction shows that the least fixpoints are equal. On the 
left-hand side, we have: 

  force (he fetch) (inject n) 
=   { inject } 
  force (he fetch) (Just (D h n), Nothing) 
=   { force, he, fetch } 
  case h n of 
    Done      -> Nothing2 
    Skip m    -> force (he fetch) (Just (D h m), Nothing) 
    Yield x m -> Just2 x (Just (D h m), Nothing) 
=   { inject } 
  case h n of 
    Done      -> Nothing2 
    Skip m    -> force (he fetch) (inject m) 
    Yield x m -> Just2 x (inject m) 

On the right, we have: 

  fmap (prod id inject) (force h n) 
=   { force } 
  fmap (prod id inject) (case h n of 
    Done      -> Nothing2 
    Skip m    -> force h m 
    Yield x m -> Just2 x m) 
=   { fmap, prod } 
  case h n of 
    Done      -> Nothing2 
    Skip m    -> fmap (prod id inject) (force h m) 
    Yield x m -> Just2 x (inject m) 

This completes the proof. 



